Model selection by resampling penalization
Authors
Abstract
Similar resources
Model selection by resampling penalization
We present a new family of model selection algorithms based on the resampling heuristics. These algorithms can be used in several frameworks, do not require any knowledge about the unknown law of the data, and may be seen as a generalization of local Rademacher complexities and V-fold cross-validation. In the case example of least-squares regression on histograms, we prove oracle inequalities, and that these ...
Model selection using Rademacher Penalization
In this paper we describe the use of Rademacher penalization for model selection. As in Vapnik's Guaranteed Risk Minimization (GRM), Rademacher penalization attempts to balance the complexity of the model with its fit to the data by minimizing the sum of the training error and a penalty term, which is an upper bound on the absolute difference between the training error and the generalization error....
A Penalization Criterion Based on Noise Behaviour for Model Selection
Complexity-penalization strategies are one way to decide on the most appropriate network size in order to address the trade-off between overfitted and underfitted models. In this paper we propose a new penalty term, derived from the behaviour of candidate models under noisy conditions, that seems to be much more robust against catastrophic overfitting errors than standard techniques. This strateg...
Resampling methods for model fitting and model selection.
Resampling procedures for fitting models and model selection are considered in this article. Nonparametric goodness-of-fit statistics are generally based on the empirical distribution function. The distribution-free property of these statistics does not hold in the multivariate case or when some of the parameters are estimated. Bootstrap methods to estimate the underlying distributions are disc...
Translation Model Adaptation by Resampling
The translation model of statistical machine translation systems is trained on parallel data coming from various sources and domains. These corpora are usually concatenated, word alignments are calculated and phrases are extracted. This means that the corpora are not weighted according to their importance to the domain of the translation task. This is in contrast to the training of the language...
Journal
Journal title: Electronic Journal of Statistics
Year: 2009
ISSN: 1935-7524
DOI: 10.1214/08-ejs196